NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Improved Offline Contextual Bandits with Second-Order Bounds: Betting and Freezing

Ryu, J Jon; Kwon, Jeongyeol; Koppe, Benjamin; Jun, Kwang-Sung (July 2025, Conference on Learning Theory (COLT))

Free, publicly-accessible full text available July 1, 2026
Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

Kwon, Jeongyeol; Dotson, Luke; Chen, Yudong; Xie, Qiaomin (May 2025, Proceedings of Machine Learning Research)

Free, publicly-accessible full text available May 3, 2026
Two-Timescale Linear Stochastic Approximation: Constant Stepsizes Go a Long Way

Kwon, Jeongyeol; Dotson, Luke; Chen, Yudong; Xie, Qiaomin (January 2025, International Conference on Artificial Intelligence and Statistics (AISTATS), 2025)

Free, publicly-accessible full text available January 31, 2026
On The Complexity of First-Order Methods in Stochastic Bilevel Optimization

Kwon, Jeongyeol; Kwon, Dohyun; Lyu, Hanbaek (July 2024, Proceedings of Machine Learning Research)

We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to the changes in the upper-level variables $$x$$. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $$O(\epsilon)$$-estimate of the lower-level solution, in addition to first-order gradient estimators {\it locally unbiased} within the $$\Theta(\epsilon)$$-ball around $y^*(x)$. We study the complexity of finding stationary points with such an $y^*$-aware oracle: we propose a simple first-order method that converges to an $$\epsilon$$ stationary point using $$O(\epsilon^{-6}), O(\epsilon^{-4})$$ access to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $$O(\epsilon)$$ with minimal assumptions. We then provide the matching $$\Omega(\epsilon^{-6})$$, $$\Omega(\epsilon^{-4})$$ lower bounds without and with an additional smoothness assumption on $y^*$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with an $y^*$-aware oracle must suffer the same lower bounds.
more » « less
Full Text Available
On The Complexity of First-Order Methods in Stochastic Bilevel Optimization

Kwon, Jeongyeol; Kwon, Dohyun; Lyu, Hanbaek (July 2024, Proceedings of Machine Learning Research)

We consider the problem of finding stationary points in Bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to the changes in the upper-level variables $$x$$. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $$O(\epsilon)$$-estimate of the lower-level solution, in addition to first-order gradient estimators locally unbiased within the $$\Theta(\epsilon)$$-ball around $y^*(x)$. We study the complexity of finding stationary points with such an $y^*$-aware oracle: we propose a simple first-order method that converges to an $$\epsilon$$ stationary point using $$O(\epsilon^{-6}), O(\epsilon^{-4})$$ access to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $$O(\epsilon)$$ with minimal assumptions. We then provide the matching $$\Omega(\epsilon^{-6})$$, $$\Omega(\epsilon^{-4})$$ lower bounds without and with an additional smoothness assumption, respectively. Our results imply that any approach that simulates an algorithm with an $y^*$-aware oracle must suffer the same lower bounds.
more » « less
Full Text Available
Global Optimality of the EM Algorithm for Mixtures of Two-Component Linear Regressions

https://doi.org/10.1109/TIT.2024.3435522

Kwon, Jeongyeol; Qian, Wei; Chen, Yudong; Caramanis, Constantine; Davis, Damek; Ho, Nhat (September 2024, IEEE Transactions on Information Theory)

Full Text Available
On penalty methods for nonconvex bilevel optimization and first-order stochastic approximation

Kwon, Jeongyeol; Kwon, Dohyun; Wright, Stephen; Nowak, Robert (February 2024, ICLR 2024)

Full Text Available
On Penalty Methods for Nonconvex Bilevel Optimization and First-Order Stochastic Approximation

Kwon, Jeongyeol; Kwon, Dohyun; Wright, Stephen; Nowak, Robert D (January 2024, International Conference on Learning Representations)

In this work, we study first-order algorithms for solving Bilevel Optimization (BO) where the objective functions are smooth but possibly nonconvex in both levels and the variables are restricted to closed convex sets. As a first step, we study the landscape of BO through the lens of penalty methods, in which the upper- and lower-level objectives are combined in a weighted sum with penalty parameter . In particular, we establish a strong connection between the penalty function and the hyper-objective by explicitly characterizing the conditions under which the values and derivatives of the two must be -close. A by-product of our analysis is the explicit formula for the gradient of hyper-objective when the lower-level problem has multiple solutions under minimal conditions, which could be of independent interest. Next, viewing the penalty formulation as -approximation of the original BO, we propose first-order algorithms that find an -stationary solution by optimizing the penalty formulation with . When the perturbed lower-level problem uniformly satisfies the {\it small-error} proximal error-bound (EB) condition, we propose a first-order algorithm that converges to an -stationary point of the penalty function using in total accesses to first-order stochastic gradient oracles. Under an additional assumption on stochastic oracles, we show that the algorithm can be implemented in a fully {\it single-loop} manner, {\it i.e.,} with samples per iteration, and achieves the improved oracle-complexity of .
more » « less
Full Text Available
Feed Two Birds with One Scone: Exploiting Wild Data for Both Out-of-Distribution Generalization and Detection

Bai, Haoyue; Canal, Gregory; Du, Xuefeng; Kwon, Jeongyeol; Nowak, Robert D; Li, Yixuan (August 2023, International Conference on Machine Learning)

Modern machine learning models deployed in the wild can encounter both covariate and semantic shifts, giving rise to the problems of out-of-distribution (OOD) generalization and OOD detection respectively. While both problems have received significant research attention lately, they have been pursued independently. This may not be surprising, since the two tasks have seemingly conflicting goals. This paper provides a new unified approach that is capable of simultaneously generalizing to covariate shifts while robustly detecting semantic shifts. We propose a margin-based learning framework that exploits freely available unlabeled data in the wild that captures the environmental test-time OOD distributions under both covariate and semantic shifts. We show both empirically and theoretically that the proposed margin constraint is the key to achieving both OOD generalization and detection. Extensive experiments show the superiority of our framework, outperforming competitive baselines that specialize in either OOD generalization or OOD detection. Code is publicly available at https://github.com/deeplearning-wisc/scone.
more » « less
Full Text Available
The EM Algorithm gives Sample Optimality for Learning Mixtures of Well-Separated Gaussians

Kwon, Jeongyeol; Caramanis, Constantine (July 2020, Proceedings of Thirty Third Conference on Learning Theory)

We consider the problem of spherical Gaussian Mixture models with 𝑘≥3 components when the components are well separated. A fundamental previous result established that separation of Ω(log𝑘‾‾‾‾‾√) is necessary and sufficient for identifiability of the parameters with \textit{polynomial} sample complexity (Regev and Vijayaraghavan, 2017). In the same context, we show that 𝑂̃ (𝑘𝑑/𝜖2) samples suffice for any 𝜖≲1/𝑘, closing the gap from polynomial to linear, and thus giving the first optimal sample upper bound for the parameter estimation of well-separated Gaussian mixtures. We accomplish this by proving a new result for the Expectation-Maximization (EM) algorithm: we show that EM converges locally, under separation Ω(log𝑘‾‾‾‾‾√). The previous best-known guarantee required Ω(𝑘‾‾√) separation (Yan, et al., 2017). Unlike prior work, our results do not assume or use prior knowledge of the (potentially different) mixing weights or variances of the Gaussian components. Furthermore, our results show that the finite-sample error of EM does not depend on non-universal quantities such as pairwise distances between means of Gaussian components.
more » « less
Full Text Available

« Prev Next »

Search for: All records